# Knowledge Distillation BERT
## Prunedbert L12 H256 A4 Finetuned

A lightweight model based on the BERT architecture, pre-trained with knowledge distillation, with a hidden dimension of 256 and 4 attention heads.

Tags: Large Language Model, Transformers
Author: eli4s
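
As a rough illustration of the architecture named above, the sketch below builds an untrained BERT encoder with 12 layers (L12), hidden size 256 (H256), and 4 attention heads (A4) using the Transformers library. The intermediate size of 1024 is an assumption following the usual 4x-hidden BERT convention, not something stated in the listing.

```python
# Sketch only: an untrained BERT encoder matching the L12 / H256 / A4 shape
# described above. The intermediate_size of 1024 (4 * hidden_size) is an
# assumption based on the standard BERT convention.
from transformers import BertConfig, BertModel

config = BertConfig(
    num_hidden_layers=12,    # L12
    hidden_size=256,         # H256
    num_attention_heads=4,   # A4
    intermediate_size=1024,  # assumed: 4 * hidden_size
)
model = BertModel(config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```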

## Bert L12 H384 A6

A lightweight BERT model pre-trained on the BookCorpus dataset via knowledge distillation, with the hidden dimension reduced to 384 and 6 attention heads.

Tags: Large Language Model, Transformers
Author: eli4s
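
If the checkpoint is published on the Hugging Face Hub, it can be loaded for feature extraction with Transformers as sketched below; the repo id `eli4s/Bert-L12-h384-A6` is inferred from the author and model name listed here and should be treated as an assumption.

```python
# Sketch only: load the distilled encoder and extract hidden states.
# The repo id is an assumption inferred from the listing above.
import torch
from transformers import AutoModel, AutoTokenizer

repo_id = "eli4s/Bert-L12-h384-A6"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)

inputs = tokenizer("Knowledge distillation keeps BERT small.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # expected shape: (1, seq_len, 384)
print(hidden.shape)
```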